Introduction and Background

Images constitute an important part of semi-structured data collection. But what to do with the collected set of images? How to understand, process and analyze them for meaningful insight?

In this notebook we explore some elementary image-analysis tools. The keyword is elementary.

We’re using two popular R packages for this one: EBImage and imager.

EBImage provides general purpose functionality for image processing and analysis, especially for researchers in the life sciences doing work with gene expression arrays, tissue samples etc. In fact, quoting from EBImage’s vignette:

In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.

See the basic documentation and installation instructions for it [here](https://www.bioconductor.org/packages/release/bioc/html/EBImage.html).

imager is a CRAN package for image and video processing functions. We’ll see a few on the way.

## Setup chunk

suppressPackageStartupMessages({

  # EBImage lives on Bioconductor, so it is installed via BiocManager
  if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") }
  if (!requireNamespace("EBImage", quietly = TRUE)) { BiocManager::install("EBImage") }
  library(EBImage)

  # imager is on CRAN
  if (!requireNamespace("imager", quietly = TRUE)) { install.packages("imager") }
  library(imager)

})
## Warning: package 'imager' was built under R version 3.5.3
setwd(".")  # point this at the folder that holds the demo images
getwd()
## [1] "C:/Users/ANISHA/Downloads/lec_4_final"

Change the setwd() path to suit your machine. Some demo images are available in the LMS folder. Let’s read them in and inspect them.

# Read Image
library(EBImage)
Image1 <- readImage(file.path("data", "isb.jpg"))            # file.path() works on any OS
Image2 <- readImage(file.path("data", "jantar_mantar.jpg"))

# What are images stored as? use 'print()'
print(Image1)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 1024 494 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]       [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.9372549 0.96470588 0.9333333 0.9450980 0.9843137 0.9803922
## [2,] 0.9764706 0.94901961 0.9960784 0.9725490 0.9450980 0.9490196
## [3,] 0.7490196 0.83921569 0.8352941 0.9176471 1.0000000 0.6588235
## [4,] 0.2078431 0.07843137 0.6196078 0.9843137 0.8235294 0.6823529
## [5,] 0.9686275 0.65882353 0.3294118 0.3882353 0.3176471 1.0000000
cat('\n')
print(Image2)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 700 410 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.2980392 0.2705882 0.2470588 0.2549020 0.3725490 0.2980392
## [2,] 0.2431373 0.3686275 0.3882353 0.2235294 0.3019608 0.3058824
## [3,] 0.2509804 0.2901961 0.3058824 0.2431373 0.1960784 0.3490196
## [4,] 0.2901961 0.2901961 0.2823529 0.3058824 0.2745098 0.2784314
## [5,] 0.2862745 0.3215686 0.3019608 0.3215686 0.3254902 0.2588235

Note above:

Images in R are stored as 3-D arrays of floating-point numbers. Each number is the intensity of one pixel in one color channel. Three layers of pixels, corresponding to the primary colors Red, Green and Blue (RGB), are superimposed to create the color at each point in the image.

The dimension of an image is given as [w x h x RGB]. With EBImage the values range from 0 to 1, whereas in OpenCV and Python they typically range from 0 to 255.
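To make the array layout concrete, here is a minimal sketch in base R (no packages needed). The tiny 2 x 2 "image" `px` and its values are made up for illustration, and the conversion assumes the usual 8-bit convention where 255 is the maximum.

```r
# A hypothetical 2 x 2 colour "image": dimensions are width x height x channel
px <- array(0, dim = c(2, 2, 3))
px[1, 1, ] <- c(1, 0, 0)        # pure red pixel
px[2, 1, ] <- c(0, 1, 0)        # pure green pixel
px[1, 2, ] <- c(0, 0, 1)        # pure blue pixel
px[2, 2, ] <- c(0.5, 0.5, 0.5)  # mid grey pixel

dim(px)      # 2 2 3
px[2, 2, ]   # the three channel values of one pixel

# Convert between the 0-1 scale and the 8-bit 0-255 scale
px_255 <- round(px * 255)
px_01  <- px_255 / 255
range(px_255)  # 0 255
```

Going back and forth like this loses a little precision because of the rounding, which is one reason to stay on the 0-1 scale while working in R.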

Larger images have more pixels. Which of the two images above is larger? To make proper comparisons, perhaps it’s better to resize images to the same dimensions? Try ?EBImage::resize. imager also has a resize function, so we must specify the package.

We’ll later also use the regular summary func on the numeric arrays representing Images 1 & 2.
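A quick back-of-the-envelope check on the size question, using only the dimensions printed above. This is a sketch in base R; the variable names `dim1` and `dim2` are just for illustration.

```r
# Dimensions (width, height) as printed for the two images above
dim1 <- c(1024, 494)
dim2 <- c(700, 410)

# Pixel counts per colour channel
prod(dim1)   # 505856
prod(dim2)   # 287000

# The aspect ratios differ, so rescaling Image1 to exactly 700 x 410
# would stretch it slightly
dim1[1] / dim1[2]
dim2[1] / dim2[2]

# Height that would keep Image1's aspect ratio at a width of 700
round(700 * dim1[2] / dim1[1])   # 338
```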

dim(Image1)[1:2]
## [1] 1024  494
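# Bring Image1 to the same shape as Image2. try '?resize'
# Note: w is left at Image1's own width, so nothing is rescaled here; output.dim just
# crops the canvas to 700 x 410 (the pixel values printed below are unchanged).
# To rescale instead: EBImage::resize(Image1, w = dim(Image2)[1], h = dim(Image2)[2])
Image1 = EBImage::resize(Image1, dim(Image1)[1], output.dim = dim(Image2)[1:2])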
print(Image1)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 700 410 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]       [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.9372549 0.96470588 0.9333333 0.9450980 0.9843137 0.9803922
## [2,] 0.9764706 0.94901961 0.9960784 0.9725490 0.9450980 0.9490196
## [3,] 0.7490196 0.83921569 0.8352941 0.9176471 1.0000000 0.6588235
## [4,] 0.2078431 0.07843137 0.6196078 0.9843137 0.8235294 0.6823529
## [5,] 0.9686275 0.65882353 0.3294118 0.3882353 0.3176471 1.0000000
print(Image2)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 700 410 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.2980392 0.2705882 0.2470588 0.2549020 0.3725490 0.2980392
## [2,] 0.2431373 0.3686275 0.3882353 0.2235294 0.3019608 0.3058824
## [3,] 0.2509804 0.2901961 0.3058824 0.2431373 0.1960784 0.3490196
## [4,] 0.2901961 0.2901961 0.2823529 0.3058824 0.2745098 0.2784314
## [5,] 0.2862745 0.3215686 0.3019608 0.3215686 0.3254902 0.2588235
summary(Image1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.3255  0.6510  0.6155  0.9412  1.0000
summary(Image2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2549  0.5373  0.5035  0.7373  1.0000
# Display read images
EBImage::display(Image1)

EBImage::display(Image2)

# Plot data with 'hist()'
  hist(Image1) # <1 sec

Above, we see the number of pixels (Y-axis) against their intensity (X-axis, on a 0-1 scale) for each of the three RGB matrices. Higher intensity gives a lighter, brighter hue.

Because the images are stored as numeric arrays, standard arithmetic ops like +, -, /, * and square roots are doable on them.
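If you want the numbers behind such a histogram rather than the picture, base R's hist() can return the bin counts without plotting. The sketch below uses a made-up random array `img` as a stand-in for a real image, and the ten-bin layout is an arbitrary choice.

```r
set.seed(1)  # reproducible stand-in data
# Hypothetical 20 x 10 colour image with random intensities in [0, 1]
img <- array(runif(20 * 10 * 3), dim = c(20, 10, 3))

breaks <- seq(0, 1, length.out = 11)  # ten equal-width intensity bins

# One column of bin counts per channel
counts <- sapply(1:3, function(k) hist(img[, , k], breaks = breaks, plot = FALSE)$counts)
colnames(counts) <- c("red", "green", "blue")
counts
colSums(counts)  # each channel has 20 * 10 = 200 pixels
```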

Behold.

## Manipulating brightness
# Light
a <- Image1 + 0.4
print(a)
## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 700 410 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]      [,2]      [,3]      [,4]      [,5]     [,6]
## [1,] 1.3372549 1.3647059 1.3333333 1.3450980 1.3843137 1.380392
## [2,] 1.3764706 1.3490196 1.3960784 1.3725490 1.3450980 1.349020
## [3,] 1.1490196 1.2392157 1.2352941 1.3176471 1.4000000 1.058824
## [4,] 0.6078431 0.4784314 1.0196078 1.3843137 1.2235294 1.082353
## [5,] 1.3686275 1.0588235 0.7294118 0.7882353 0.7176471 1.400000
EBImage::display(a)

# Dark
b <- Image1 - 0.4
EBImage::display(b)

hist(b)  # see x-axis on how the values have shifted by 0.4

# 'combine()' forms a series or set of images
img_series <- EBImage::combine(Image1, Image2)
EBImage::display(img_series, all = TRUE)  # makes a series of connected images (frames)

### Arithmetic ops on image pixel values

# 'adding' two pictures into one
d <- Image1 + Image2
EBImage::display(d)

hist(d)

## Manipulating contrast - 'multiplying' pixel values
e <- Image1*0.5
EBImage::display(e)  # why's the image coming out darker? Hint: every pixel value is halved, and lower values are darker

f <- Image1*3
EBImage::display(f)

# side by side: original, brighter (+0.3), more contrast (*2), gamma (^0.5; sqrt of a num < 1 is > num, so it brightens)
img_comb = EBImage::combine(
  Image1,
  Image1 + 0.3,
  Image1 * 2,
  Image1 ^ 0.5
)

EBImage::display(img_comb, all=TRUE)
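The operations above can push values outside the 0-1 range (see the printed values of `a`, which go up to 1.4). If you plan to keep working with such a result, it can help to clip it back explicitly. Below is a minimal base R sketch on a made-up 2 x 3 patch; `clip01` is a hypothetical helper name, not an EBImage function.

```r
# Hypothetical 2 x 3 single-channel patch with intensities in [0, 1]
x <- matrix(c(0.1, 0.4, 0.7, 0.2, 0.9, 1.0), nrow = 2)

# Clip values back into the [0, 1] range
clip01 <- function(v) pmin(pmax(v, 0), 1)

brighter <- clip01(x + 0.4)  # additive shift: brightness
contrast <- clip01(x * 2)    # multiplicative stretch: contrast
gamma    <- x ^ 0.5          # square root lifts mid-tones, stays within [0, 1]

range(x + 0.4)    # 0.5 1.4 before clipping
range(brighter)   # 0.5 1.0 after clipping
all(gamma >= x)   # TRUE: the square root of a number in [0, 1] is never smaller
```

Note the difference between the three: adding shifts every pixel by the same amount, multiplying stretches bright pixels more than dark ones, and the power transform reshapes the curve without ever leaving the 0-1 range.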

Being numeric arrays, images can be conveniently manipulated by any of R’s arithmetic operators. For example, we can produce a negative image by simply subtracting the image from its maximum value.

Behold.

## Building negatives of an image
img_neg = max(Image1) - Image1
EBImage::display( img_neg )

## Color modes
colorMode(Image1) <- Grayscale
print(Image1)
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 700 410 3 
##   frames.total : 3 
##   frames.render: 3 
## 
## imageData(object)[1:5,1:6,1]
##           [,1]       [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.9372549 0.96470588 0.9333333 0.9450980 0.9843137 0.9803922
## [2,] 0.9764706 0.94901961 0.9960784 0.9725490 0.9450980 0.9490196
## [3,] 0.7490196 0.83921569 0.8352941 0.9176471 1.0000000 0.6588235
## [4,] 0.2078431 0.07843137 0.6196078 0.9843137 0.8235294 0.6823529
## [5,] 0.9686275 0.65882353 0.3294118 0.3882353 0.3176471 1.0000000
EBImage::display(Image1)
## Only the first frame of the image stack is displayed.
## To display all frames use 'all = TRUE'.
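Notice in the printout above that dim is still 700 410 3 and frames.render is now 3: switching colorMode only reinterprets the three channels as three separate grayscale frames, it does not merge them. To get a single gray image the channels have to be combined. Here is a base R sketch on a made-up array; the weights are the commonly quoted luma weights, which is an assumed choice rather than the only one. (EBImage's channel() function is meant for this kind of conversion; see ?channel.)

```r
set.seed(42)  # reproducible stand-in data
# Hypothetical 4 x 3 colour image: width x height x (R, G, B)
rgb_img <- array(runif(4 * 3 * 3), dim = c(4, 3, 3))

# Simple average of the three channels
gray_mean <- (rgb_img[, , 1] + rgb_img[, , 2] + rgb_img[, , 3]) / 3

# Weighted average using the commonly quoted luma weights (an assumed choice)
w <- c(0.299, 0.587, 0.114)
gray_luma <- w[1] * rgb_img[, , 1] + w[2] * rgb_img[, , 2] + w[3] * rgb_img[, , 3]

dim(gray_luma)    # 4 3: one matrix instead of three frames
range(gray_luma)  # still within [0, 1] because the weights sum to 1
```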

colorMode(Image1) <- Color #to return to color

# Cropping
k <- Image1[100:300, 200:300,]
EBImage::display(k)

## New image file
writeImage(k, "./NewImage.jpg")   # check in wd
# Flip, Flop, Rotate, resize
l <- flip(Image1)
EBImage::display(l)

m <- rotate(Image1, 45)
EBImage::display(m)

n<- flop(Image1)
EBImage::display(n)

o <- EBImage::resize(Image1, 400)
EBImage::display(o)
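Cropping above is plain array indexing, and flipping can be thought of the same way. The base R sketch below mimics these ops on a made-up 4 x 3 matrix, assuming the [x, y] layout used by EBImage; under that assumption flip() corresponds roughly to reversing the second index and flop() to reversing the first.

```r
# Hypothetical 4 x 3 single-channel image, indexed as [x, y]
img <- matrix(1:12, nrow = 4, ncol = 3)

crop    <- img[2:3, 1:2]       # keep x = 2..3 and y = 1..2
flipped <- img[, ncol(img):1]  # reverse the y axis (top <-> bottom)
flopped <- img[nrow(img):1, ]  # reverse the x axis (left <-> right)

dim(crop)       # 2 2
flipped[, 1]    # 9 10 11 12, formerly the last column
flopped[1, ]    # 4 8 12, formerly the last row
```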

One may ask - why bother with these simple ops like cropping or rotating or resizing etc when you can do them in, heck, MSPaint?

Good Q. What we saw above was merely a demo for one odd image. But what if we have collected not a few but many 100s of images?

Getting things into R or Py has the advantage that the task becomes programmable, scalable, flexible and easy to integrate into larger workflows.
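As an illustration of what "programmable" buys us, here is a hedged sketch of a batch resize over a whole folder. The function name `resize_all`, the folder names and the default width are all placeholders of mine; the EBImage calls are the same ones used earlier in this notebook.

```r
# Resize every JPEG in `in_dir` to a common width and write the results to `out_dir`.
# `in_dir`, `out_dir` and `width` are placeholders; requires the EBImage package.
resize_all <- function(in_dir, out_dir, width = 400) {
  files <- list.files(in_dir, pattern = "\\.jpe?g$", ignore.case = TRUE, full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (f in files) {
    img <- EBImage::readImage(f)
    img <- EBImage::resize(img, w = width)  # height follows the aspect ratio
    EBImage::writeImage(img, file.path(out_dir, basename(f)))
  }
  invisible(length(files))  # number of images processed
}

# Example call (not run): resize_all("data", "data_resized", width = 400)
```

The same loop would handle 5 images or 5,000, which is the whole point.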

There’s so much more we can do in image processing but I won’t go there for two reasons:

1. Quite a bit of theory and math will creep in, which will be handled by other courses in later terms.
2. The really interesting bits of work with image processing all require some M/L and D/L workflows around them. That will come in good time later on.

For now, you’ve been introduced to the elementary aspects of image-handling, for use once we’ve done data collection (DC) on image data.

Scraping image search results with imager

Below I demo one neat application of imager - using it to scrape google image search results.

Good that we’ve covered basic web scraping functionality earlier. I’ll invoke rvest in what follows.

P.S. Works just the same for Bing etc also, BTW.

## Pulling images off of goog search pages

suppressPackageStartupMessages({ 
  
  if (!require(imager)){install.packages("imager")}
  library(imager)  # https://dahtah.github.io/imager/imager.html
  if (!require(rvest)){install.packages("rvest")}
  library(rvest)
  if (!require(magick)){install.packages("magick")}
  library(magick)
})
## Warning: package 'magick' was built under R version 3.5.3
# Run a search query (returning html content)
search <- read_html("https://www.google.com/search?site=&tbm=isch&q=ISB")

# Grab all <img> tags, get their "src" attribute, a URL to an image
urls <- search %>% html_nodes("img") %>% html_attr("src")  # get the image URLs
length(urls)
## [1] 20
urls[1:4]
## [1] "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQf6wSJ-O7EoFZ7-e7aJz2ynuWKgOblVTUP4D0q9TdzOFSiq6YyvM0GzHs" 
## [2] "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS3K1ij6cQTuY7Mc59-QQVr1RU_gpm8HeygcYGXbozAuvolV0oKcoJEcJHu"
## [3] "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT06UKMhP8lO1tcqOZhIJe-oHkSMwdLf6Ed0YmL3E3b4GPf3sNmCCC2Zimq"
## [4] "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSqVYS5Bj4UEHlaNErQGbO5sEDZvWGND-SnkWxENdLyPCi-80d-enLkWG_O"

Check out if the original search string ‘https://www.google.com/search?site=&tbm=isch&q=ISB’ makes sense by directly typing it into a browser. Likewise, check if the urls in the ‘urls’ object make sense by direct-typing into browser address-bar.
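One practical wrinkle: depending on the page, the src attribute of an img tag is not always a full web link. It may be missing, relative, or an inline data string. A minimal base R sketch for keeping only the usable links follows; the vector `src` is made up for illustration and does not come from the actual search above.

```r
# Made-up examples of what html_attr("src") might return
src <- c(
  "https://encrypted-tbn0.gstatic.com/images?q=tbn:EXAMPLE1",
  NA,
  "/images/branding/logo.gif",
  "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP",
  "https://encrypted-tbn0.gstatic.com/images?q=tbn:EXAMPLE2"
)

# Keep only absolute http(s) links; NA entries are dropped as well
is_http <- !is.na(src) & grepl("^https?://", src)
clean_urls <- src[is_http]
length(clean_urls)  # 2
```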

Now that we have the image urls, it should be a simple matter to render them in R, no? Well, truth is, that took me a while to figure out. imager is built on the CImg library, and to import web-scraped images into R I took a roundabout route of writing them to some temp location and then re-reading from disk.

I’m displaying just 4 images, using a ‘map_imageList’ func called map_il() that works like the ‘map_xyz’ funcs we have in the tidyverse.

Behold, below.

require(magick)
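# Read an image from a URL with magick, then convert it to imager's cimg format
load.image1 <- function(url){
  test = image_read(url) %>% magick2cimg(., alpha="flatten")  # alpha = "flatten": flatten the alpha (transparency) channel
  return(test)
}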

system.time({ map_il(urls[1:4], load.image1) %>% plot })

##    user  system elapsed 
##    0.07    0.04    0.25
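For reference, the roundabout temp-file route mentioned earlier can be sketched as below. This is not the code used above (that goes through magick); `load_via_tempfile` is a hypothetical helper name, and the ".jpg" extension assumes the thumbnails are JPEGs, which may not hold for every search result.

```r
# Download one image URL to a temporary file, then read it with imager.
# Assumes the thumbnail is a JPEG (hence the ".jpg" extension) and that
# the imager package is installed.
load_via_tempfile <- function(url) {
  tmp <- tempfile(fileext = ".jpg")
  on.exit(unlink(tmp), add = TRUE)  # clean up the temp file afterwards
  # binary mode matters on Windows
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  imager::load.image(tmp)
}

# Example call (not run): plot(load_via_tempfile(urls[1]))
```

Either route ends in the same place: a cimg object that imager can plot and manipulate.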

Well, I’ll signoff here.

I realize we haven’t done video loading and frame capture. Once the frames are captured, we can handle them like any other images. I haven’t yet decided to go with it or not coz of time concerns.

Ciao.

Sudhir